An Examination of Audio-Visual Fused HMMs for Speaker Recognition
Abstract
Fused hidden Markov models (FHMMs) have been shown to work well for the task of audio-visual speaker recognition, but only in an output decision-fusion configuration of both the audio- and video-biased versions of the FHMM structure. This paper looks at the performance of the audio- and video-biased versions independently, and shows that the audio-biased version is considerably more capable for speaker recognition. Additionally, this paper shows that by taking advantage of the temporal relationship between the acoustic and visual data, the audio-biased FHMM provides better performance at less processing cost than the best-performing output decision-fusion of regular HMMs.
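For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of one common form of output decision fusion: a weighted sum of per-speaker scores from separately trained audio and video classifiers. It is an illustration only and is not taken from the paper. The function names, the `audio_weight` parameter, and the score values are all hypothetical.

```python
def fuse_scores(audio_scores, video_scores, audio_weight):
    """Weighted-sum fusion of per-speaker scores (e.g. log-likelihoods).

    Assumption: both lists are indexed by the same speaker order and the
    scores are already on comparable scales.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if len(audio_scores) != len(video_scores):
        raise ValueError("score lists must cover the same speakers")
    video_weight = 1.0 - audio_weight
    return [
        audio_weight * a + video_weight * v
        for a, v in zip(audio_scores, video_scores)
    ]


def identify_speaker(audio_scores, video_scores, audio_weight=0.5):
    """Return the index of the speaker with the highest fused score."""
    fused = fuse_scores(audio_scores, video_scores, audio_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three enrolled speakers.
audio = [-120.0, -100.0, -110.0]
video = [-50.0, -80.0, -40.0]
best = identify_speaker(audio, video, audio_weight=0.5)
```

In a scheme like this each modality is scored independently and combined only at the end, so any temporal relationship between the acoustic and visual streams is ignored. That relationship is what the abstract says the audio-biased FHMM exploits.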
Related resources
A New Approach to Integrate Audio and Visual Features of Speech
This paper presents a novel fused-hidden Markov model (fused-HMM) to integrate the audio and visual features of speech. In this model, audio and visual HMMs built individually are fused together using a general probabilistic fusion method, which is optimal in the maximum entropy sense. Specifically, the fusion method uses the dependencies between the audio hidden states and the visual observati...
Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition
A technique known as fused hidden Markov models (FHMMs) was recently proposed as an alternative multi-stream modelling technique for audio-visual speaker recognition. In this paper we show that for audio-visual speech recognition (AVSR), FHMMs can be adopted as a novel method of training synchronous MSHMMs. MSHMMs, as proposed by several authors for use in AVSR, are jointly trained on both the ...
Fused HMM adaptation of synchronous HMMs for audio-visual speaker verification
A technique known as fused hidden Markov models (FHMMs) was recently proposed as an alternative multi-stream modelling technique for audio-visual speaker recognition. In this paper, we will show that instead of being treated as a separate modelling technique, FHMMs can be adopted as a novel method of training synchronous hidden Markov models (SHMMs). SHMMs are traditionally jointly trained on bot...
Audio-Visual Speaker Verification using Continuous Fused HMMs
This paper examines audio-visual speaker verification using a novel adaptation of fused hidden Markov models, in comparison to output fusion of individual classifiers in the audio and video modalities. A comparison of both hidden Markov model (HMM) and Gaussian mixture model (GMM) classifiers in both modalities under output fusion shows that the choice of audio classifier is more important than vid...
Asynchrony modeling for audio-visual speech recognition
We investigate the use of multi-stream HMMs in the automatic recognition of audio-visual speech. Multi-stream HMMs allow the modeling of asynchrony between the audio and visual state sequences at a variety of levels (phone, syllable, word, etc.) and are equivalent to product, or composite, HMMs. In this paper, we consider such models synchronized at the phone boundary level, allowing various de...